Abstract. Using Bayesian experimental design techniques, we have shown that for a single
two-level quantum mechanical system under strong (projective) measurement, the dynamical
parameters of a model Hamiltonian can be estimated with exponentially improved accuracy over
offline estimation strategies. To achieve this, we derive an adaptive protocol which finds the
optimal experiments based on previous observations. We show that the risk associated with this
algorithm is close to the global optimum, given a uniform prior. Additionally, we show that
sampling at the Nyquist rate is not optimal.
INTRODUCTION

Quantum mechanics gives the most accurate description of many physical systems of
interest. In turn, the most accurate characterization of a quantum device is given by its
quantum mechanical model. Thus, efficient methods for the honest estimation of the
distribution of parameters in a quantum mechanical model are of utmost importance,
not only for building robust quantum technologies, but also for reaching new regimes of physics.

Bayesian experimental design (see, e.g., [1]) is a methodology to ascertain the utility
of a proposed experiment. Bayesian experimental design has been successfully applied
to problems in experimental physics, such as in the recent examples of [2] and [3].
In classical theories of physics and statistics, the measurement simply reveals the state
of the system at that instant. By contrast, quantum theory presents the following
physical (and conceptual) barrier: no single measurement can reveal the state. Rather,
each potential kind of experiment admits a probability distribution from which we draw
our data. Thus, the methodology of experimental design seems tailor-made for quantum
theory.

The structure of the paper is as follows. We begin by reviewing the general outline of
Bayesian experimental design. We then apply the technique to devise an algorithm for
the estimation of quantum Hamiltonian parameters. We show that, in a particular case,
this strategy is nearly globally optimal, and we demonstrate numerically its improvement
over standard algorithms. Finally, we conclude with a discussion of the applicability of
this technique to real experiments on more complex quantum systems.
BAYESIAN EXPERIMENTAL DESIGN



We assume some initial experiment $E$ has been performed and data $D$ has been obtained.
The goal is to determine $\Pr(\Theta \mid D, E)$, the probability distribution of the model
parameters given the experimental data. To achieve this we use Bayes' rule

$$
\Pr(\Theta \mid D, E) = \frac{\Pr(D \mid \Theta, E)\,\Pr(\Theta \mid E)}{\Pr(D \mid E)},
$$

where $\Pr(D \mid \Theta, E)$ is the likelihood function, which is determined through the process
of modeling the experiment, and $\Pr(\Theta \mid E)$ is the prior, which encodes any a priori
knowledge of the model parameters. The final term $\Pr(D \mid E)$ can simply be thought of as a
normalization factor.
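
As a minimal numerical sketch of this update (an illustration, not taken from the paper), the posterior can be approximated on a discrete grid of parameter values. The function name `bayes_update`, the grid, and the example likelihood are all assumptions made for the illustration.

```python
import numpy as np

def bayes_update(prior, likelihood):
    """Return posterior weights Pr(theta | D, E) on a parameter grid.

    prior      -- array of prior weights Pr(theta | E), summing to 1
    likelihood -- array of Pr(D | theta, E) evaluated on the same grid
    """
    unnormalized = likelihood * prior
    evidence = unnormalized.sum()      # Pr(D | E), the normalization factor
    return unnormalized / evidence

# Hypothetical example: 5 grid points, flat prior, likelihood favoring large theta.
theta = np.linspace(0.0, 1.0, 5)
prior = np.full(theta.size, 1.0 / theta.size)
posterior = bayes_update(prior, likelihood=theta**2)
```

On a grid the integral in the normalization becomes a sum, so the posterior weights sum to one by construction.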

At this stage we can stop or obtain further data. Experimental design is well suited to
quantum theory since an arbitrary fixed measurement procedure does not give maximal
knowledge, as is often assumed in the statistical modeling of classical systems. We
conceive, then, of possible future data $D_1$ obtained from a, possibly different, experiment
$E_1$. The probability of obtaining this data can be computed from the distributions at hand
by marginalizing over the model parameters:

$$
\Pr(D_1 \mid E_1, D, E) = \int \Pr(D_1 \mid \Theta, E_1)\,\Pr(\Theta \mid D, E)\, d\Theta .
$$

We can use this distribution to calculate the expected utility of an experiment

$$
U(E_1) = \sum_{D_1} \Pr(D_1 \mid E_1, D, E)\, U(D_1, E_1),
$$

where $U(D_1, E_1)$ is the utility we would derive if experiment $E_1$ gave result $D_1$. This
could in principle be any function tailored to the specific problem. However, for scientific
inference, a generally well-motivated measure of utility is information gain [4]. In
information theory, information is measured by the entropy:

$$
U(D_1, E_1) = \int \Pr(\Theta \mid D_1, E_1, D, E)\, \log \Pr(\Theta \mid D_1, E_1, D, E)\, d\Theta .
$$

Thus, we search for the experiment which maximizes the expected information in the
final distribution. That is, an optimal experiment $\hat{E}$ is one which satisfies

$$
U(\hat{E}) = \max_{E_1} \left\{ \sum_{D_1} \Pr(D_1 \mid E_1, D, E)
\int \Pr(\Theta \mid D_1, E_1, D, E)\, \log \Pr(\Theta \mid D_1, E_1, D, E)\, d\Theta \right\} .
$$
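
The expected utility above can be sketched numerically as follows. This is a grid-based illustration under stated assumptions, not the authors' implementation: the name `expected_utility`, the four-point grid, and the two toy experiments are hypothetical. On a grid, the integral becomes a sum over discrete weights, which shifts every utility by the same constant and so does not change which experiment is optimal.

```python
import numpy as np

def expected_utility(prior, likelihoods):
    """Expected negative-entropy utility U(E_1) of one candidate experiment.

    prior       -- weights Pr(theta | D, E) on a grid, summing to 1
    likelihoods -- array of shape (n_outcomes, n_grid) holding
                   Pr(D_1 | theta, E_1) for every outcome D_1
    """
    total = 0.0
    for lik in likelihoods:
        joint = lik * prior
        p_outcome = joint.sum()        # Pr(D_1 | E_1, D, E)
        if p_outcome == 0.0:
            continue                   # impossible outcome contributes nothing
        post = joint / p_outcome
        nz = post > 0                  # treat 0 log 0 as 0
        total += p_outcome * np.sum(post[nz] * np.log(post[nz]))
    return total

# Two hypothetical experiments on a 4-point grid with a flat prior.
prior = np.full(4, 0.25)
flat = np.array([[0.5] * 4, [0.5] * 4])                     # outcome ignores theta
sharp = np.array([[1, 1, 0, 0], [0, 0, 1, 1]], dtype=float)  # outcome splits the grid
u_flat = expected_utility(prior, flat)
u_sharp = expected_utility(prior, sharp)
```

The experiment whose outcome depends on the parameter scores higher, which is the sense in which the design step prefers informative measurements.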

APPLICATION TO SIMPLE EXAMPLE 

FIGURE 1. Overview of a step in the online adaptive algorithm for finding locally optimal experiments.
Top: Method for calculating the utility function $U(E)$, given a simulator and a prior distribution
$\Pr(\Theta)$ over model parameters $\Theta$. Bottom: Method for updating the prior distribution with
the results $D$ from the chosen actual experiment.

As an example of how to apply the Bayesian experimental design formalism to problems
in quantum information, we consider a simple situation with a single qubit. In particular,
we suppose that the qubit evolves under an internal Hamiltonian




Here $\omega$ is an unknown parameter whose value we want to estimate. An experiment
consists of preparing a single known input state $\psi_{\mathrm{in}} = |+\rangle$, the $+1$
eigenstate of $\sigma_x$, evolving under the Hamiltonian $H$ for a controllable time $t$ and
performing a measurement in the $\sigma_x$ basis. This is the simplest problem where adaptive
Hamiltonian estimation can be used and is the problem studied in reference [5].

In the language of Bayesian inference, the data $D \in \{0, 1\}$ is the outcome of the
measurement. An experiment $E$ consists of a specification of the time $t$ that the Hamiltonian
is on, while the model parameter $\Theta$ is simply $\omega$. The likelihood function is given by
the Born rule

$$
\Pr(D = 0 \mid \Theta, E) = \left| \langle + | e^{-iHt} | + \rangle \right|^2 = \cos^2(\omega t) .
$$
Experimental design is a decision-theoretic problem based on the utility function

$$
U(t) = \sum_{D} \Pr(D \mid t) \int \Pr(\omega \mid D, t)\, \log \Pr(\omega \mid D, t)\, d\omega .
$$

The optimal design is any value of $t$ which maximizes this quantity.
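
A rough sketch of this design step, assuming the likelihood $\cos^2(\omega t)$ as printed above, a uniform prior on $[0, 1]$ discretized on a grid, and a hypothetical set of candidate times (the range `[0, 10]` and the grid sizes are arbitrary choices for illustration):

```python
import numpy as np

def likelihood(outcome, omega, t):
    """Pr(D | omega, t), taking Pr(D=0) = cos^2(omega t) as printed in the text."""
    p0 = np.cos(omega * t) ** 2
    return p0 if outcome == 0 else 1.0 - p0

def info_gain_utility(prior, omega, t):
    """U(t): expected negative entropy of the posterior after measuring at time t."""
    total = 0.0
    for outcome in (0, 1):
        joint = likelihood(outcome, omega, t) * prior
        p_outcome = joint.sum()                  # Pr(D | t)
        if p_outcome > 0:
            post = joint / p_outcome
            nz = post > 0                        # treat 0 log 0 as 0
            total += p_outcome * np.sum(post[nz] * np.log(post[nz]))
    return total

omega = np.linspace(0.0, 1.0, 201)               # grid over the unknown parameter
prior = np.full(omega.size, 1.0 / omega.size)    # uniform prior on [0, 1]
times = np.linspace(0.0, 10.0, 401)              # hypothetical candidate times
utilities = np.array([info_gain_utility(prior, omega, t) for t in times])
t_best = times[np.argmax(utilities)]             # design maximizing U(t) on this grid
```

A measurement at $t = 0$ always returns $D = 0$ and therefore leaves the prior unchanged; every other time can only do as well or better in expectation.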

We proceed by performing the optimal experiment and obtaining data $D_1$. Using
Bayesian inference we update our prior $\Pr(\omega)$ via Bayes' rule:

$$
\Pr(\omega \mid D_1) = \frac{\Pr(D_1 \mid \omega)\,\Pr(\omega)}{\Pr(D_1)} .
$$

If we are not satisfied, we can repeat the process, where this distribution becomes the
prior for the new experimental design step. This algorithm is depicted in Figure 1.
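
The full design-measure-update loop might look like the following sketch. It is an illustration under assumptions, not the authors' code: the likelihood is taken as printed in the text, the measurement is simulated with a hypothetical true value `true_omega`, and the design space `times` and the number of steps are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(0)

def p0(omega, t):
    return np.cos(omega * t) ** 2          # Pr(D=0 | omega, t) as printed in the text

def update(prior, omega, t, outcome):
    """Bayes' rule on the grid for one observed outcome."""
    lik = p0(omega, t) if outcome == 0 else 1.0 - p0(omega, t)
    post = lik * prior
    return post / post.sum()

def utility(prior, omega, t):
    """Expected negative entropy of the posterior for a measurement at time t."""
    total = 0.0
    for outcome in (0, 1):
        lik = p0(omega, t) if outcome == 0 else 1.0 - p0(omega, t)
        p_out = np.sum(lik * prior)
        if p_out > 0:
            post = lik * prior / p_out
            nz = post > 0
            total += p_out * np.sum(post[nz] * np.log(post[nz]))
    return total

omega = np.linspace(0.0, 1.0, 401)
times = np.linspace(0.1, 20.0, 200)        # hypothetical design space
belief = np.full(omega.size, 1.0 / omega.size)
true_omega = 0.37                          # hypothetical value used only to simulate data

for step in range(12):
    # Design: pick the time with the largest expected utility under the current belief.
    t = times[np.argmax([utility(belief, omega, tau) for tau in times])]
    # Experiment: simulate one projective measurement.
    outcome = 0 if rng.random() < p0(true_omega, t) else 1
    # Inference: the posterior becomes the prior for the next step.
    belief = update(belief, omega, t, outcome)

estimate = np.sum(omega * belief)          # posterior mean
```

Each pass through the loop corresponds to one traversal of the two halves of Figure 1: utility evaluation first, then the update with the observed result.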
Estimators, squared error loss and a greedy alternative to information gain



The preceding problem had a single unknown variable. If we desire an estimate $\hat{\Theta}$ of
the true value $\Theta$, the most often used figure of merit is the squared error loss:

$$
L(\Theta, \hat{\Theta}) = |\Theta - \hat{\Theta}|^2 .
$$

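The risk of an estimator $\hat{\Theta} : \{D, D_1, D_2, \ldots, D_N\} \to \mathbb{R}$ is its
expected performance with respect to the loss function:

$$
R(\Theta, \hat{\Theta}) = \sum_{\{D, D_1, D_2, \ldots, D_N\}}
\Pr(\{D, D_1, D_2, \ldots, D_N\} \mid \Theta)\, L(\Theta, \hat{\Theta}) .
$$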

For squared error loss, the risk is also called the mean squared error. The average of this
quantity with respect to some prior $\Pr(\Theta) =: \pi(\Theta)$ is the Bayes risk of $\hat{\Theta}$,

$$
r(\pi, \hat{\Theta}) = \int R(\Theta, \hat{\Theta})\, \pi(\Theta)\, d\Theta ,
$$

and the estimator which minimizes this quantity is called a Bayes estimator. In this case
the Bayes estimator is the mean of the posterior distribution (see footnote 1). Let us assume
then that the estimators we choose are Bayes. Let us also choose a uniform prior for $\Theta$.
Then, the final figure of merit is the average mean squared error (AMSE):

$$
r = \int R(\Theta, \hat{\Theta})\, d\Theta .
$$
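
The statement that the posterior mean is the Bayes estimator for squared error loss follows from a short, standard calculation, sketched here for completeness (it is not reproduced from the paper). Writing $\mu$ for the posterior mean and $a$ for an arbitrary candidate estimate (both symbols introduced only for this sketch), the expected loss under the posterior splits into two non-negative terms:

```latex
\begin{align}
\int \Pr(\Theta \mid D)\,(\Theta - a)^2 \, d\Theta
  &= \int \Pr(\Theta \mid D)\,\bigl[(\Theta - \mu) + (\mu - a)\bigr]^2 \, d\Theta \\
  &= \int \Pr(\Theta \mid D)\,(\Theta - \mu)^2 \, d\Theta
     + 2(\mu - a)\int \Pr(\Theta \mid D)\,(\Theta - \mu)\, d\Theta
     + (\mu - a)^2 \\
  &= \int \Pr(\Theta \mid D)\,(\Theta - \mu)^2 \, d\Theta + (\mu - a)^2 .
\end{align}
```

The cross term vanishes because $\int \Pr(\Theta \mid D)\,(\Theta - \mu)\, d\Theta = 0$ by the definition of $\mu$. The remaining expression is minimized at $a = \mu$, where it equals the posterior variance.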

We would like a strategy which minimizes the AMSE. Non-adaptive Fourier and
Bayesian strategies were investigated and compared to an adaptive strategy in reference
[5]. Their adaptive strategy fits into the Bayesian experimental design framework when
the utility is measured by the negative variance of the posterior distribution:

$$
V(D_1, E_1) = - \int \Pr(\Theta \mid D_1, E_1, D, E)\, \bigl(\Theta - \mu(D_1, E_1)\bigr)^2 \, d\Theta ,
$$

where

$$
\mu(D_1, E_1) = \int \Pr(\Theta \mid D_1, E_1, D, E)\, \Theta \, d\Theta
$$

is the mean of the posterior. Recall that the mean is the Bayes estimator for the AMSE, so
$\mu = \hat{\Theta}$. For a single measurement this utility function satisfies $V = -r$. That is,
maximizing the utility locally at each step of the algorithm is equivalent to minimizing the
AMSE at each step. Hence, when using the negative variance as our utility function, the adaptive
strategy summarized in Figure 1 is an example not only of a local optimization, but also of
a greedy algorithm with respect to the AMSE risk. In what follows, we shall refer to this
choice of utility function together with the local optimization algorithm as the greedy
algorithm for this problem.

1 Note that in any case where the loss function is strictly proper, i.e. is equal to zero if and only if the
estimate is equal to the true state, the Bayes estimator is the posterior mean [6].
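
The single-measurement identity $V = -r$ can be checked numerically. The sketch below is an illustration under assumptions: it uses the likelihood as printed in the text, a uniform prior on a grid, and a hypothetical measurement time `t = 2.0`. The function names are introduced here and do not come from the paper.

```python
import numpy as np

def p0(omega, t):
    return np.cos(omega * t) ** 2      # Pr(D=0 | omega, t) as printed in the text

def neg_variance_utility(prior, omega, t):
    """Outcome-averaged utility: minus the expected posterior variance."""
    total = 0.0
    for outcome in (0, 1):
        lik = p0(omega, t) if outcome == 0 else 1.0 - p0(omega, t)
        p_out = np.sum(lik * prior)
        if p_out > 0:
            post = lik * prior / p_out
            mean = np.sum(post * omega)
            total -= p_out * np.sum(post * (omega - mean) ** 2)
    return total

def bayes_risk(prior, omega, t):
    """Prior-averaged squared error of the posterior-mean estimator."""
    risk = 0.0
    for outcome in (0, 1):
        lik = p0(omega, t) if outcome == 0 else 1.0 - p0(omega, t)
        p_out = np.sum(lik * prior)
        if p_out > 0:
            mean = np.sum(lik * prior * omega) / p_out   # estimate given this outcome
            risk += np.sum(prior * lik * (omega - mean) ** 2)
    return risk

omega = np.linspace(0.0, 1.0, 201)
prior = np.full(omega.size, 1.0 / omega.size)
t = 2.0                                # hypothetical measurement time
v = neg_variance_utility(prior, omega, t)
r = bayes_risk(prior, omega, t)
```

The two functions group the same sum differently, by outcome first or by parameter value first, which is why they agree up to sign.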

We can write the risk of this strategy recursively as follows. Suppose at the N'th, and
final, measurement we have the updated distribution $\pi_{N-1}$. Then, the risk of the local
strategy is

$$
l_N(\pi_{N-1}, \Theta) = \sum_{D_N} \Pr(D_N \mid \Theta, \hat{E}_N)\, L\bigl(\Theta, \mu(D_N, \hat{E}_N)\bigr),
$$

where $\hat{E}_N$ is the locally optimal design satisfying

$$
\hat{E}_N = \operatorname*{argmin}_{E_N} \int \sum_{D_N} \Pr(D_N \mid \Theta, E_N)\,
L\bigl(\Theta, \mu(D_N, E_N)\bigr)\, \pi_{N-1}(\Theta)\, d\Theta .
$$

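The expected risk at any other stage is

$$
l_n(\pi_{n-1}, \Theta) = \sum_{D_n} \Pr(D_n \mid \Theta, \hat{E}_n)\,
l_{n+1}\!\left( \frac{\Pr(D_n \mid \Theta, \hat{E}_n)\, \pi_{n-1}(\Theta)}
{\int \Pr(D_n \mid \Theta, \hat{E}_n)\, \pi_{n-1}(\Theta)\, d\Theta},\ \Theta \right),
$$

where $\hat{E}_n$ is, again, the locally optimal design satisfying

$$
\hat{E}_n = \operatorname*{argmin}_{E_n} \int \sum_{D_n} \Pr(D_n \mid \Theta, E_n)\,
L\bigl(\Theta, \mu(D_n, E_n)\bigr)\, \pi_{n-1}(\Theta)\, d\Theta .
$$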

Then, the Bayes risk of the greedy strategy is

$$
\int l_1(\pi_0, \Theta)\, \pi_0(\Theta)\, d\Theta .
$$

Again, it is clear that the greedy algorithm is globally optimal on the final decision, as
there is no further hypothetical data to consider. That is, the optimal solution at the N'th
measurement is

$$
g_N(\pi_{N-1}, \Theta) = \sum_{D_N} \Pr(D_N \mid \Theta, \hat{E}_N)\, L\bigl(\Theta, \mu(D_N, \hat{E}_N)\bigr),
$$

where $\hat{E}_N$ is the locally optimal design satisfying

$$
\hat{E}_N = \operatorname*{argmin}_{E_N} \int \sum_{D_N} \Pr(D_N \mid \Theta, E_N)\,
L\bigl(\Theta, \mu(D_N, E_N)\bigr)\, \pi_{N-1}(\Theta)\, d\Theta .
$$

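However, the globally optimal risk at any other stage is

$$
g_n(\pi_{n-1}, \Theta) = \sum_{D_n} \Pr(D_n \mid \Theta, \hat{E}_n)\,
g_{n+1}\!\left( \frac{\Pr(D_n \mid \Theta, \hat{E}_n)\, \pi_{n-1}(\Theta)}
{\int \Pr(D_n \mid \Theta, \hat{E}_n)\, \pi_{n-1}(\Theta)\, d\Theta},\ \Theta \right),
$$

where now $\hat{E}_n$ is the globally optimal design satisfying

$$
\hat{E}_n = \operatorname*{argmin}_{E_n} \int \sum_{D_n} \Pr(D_n \mid \Theta, E_n)\,
g_{n+1}\!\left( \frac{\Pr(D_n \mid \Theta, E_n)\, \pi_{n-1}(\Theta)}
{\int \Pr(D_n \mid \Theta, E_n)\, \pi_{n-1}(\Theta)\, d\Theta},\ \Theta \right)
\pi_{n-1}(\Theta)\, d\Theta .
$$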
Then, the Bayes risk of the globally optimal strategy is

$$
\int g_1(\pi_0, \Theta)\, \pi_0(\Theta)\, d\Theta .
$$

In general, $l_1(\pi_0, \Theta) \neq g_1(\pi_0, \Theta)$. Nor is it the case that

$$
\int l_1(\pi_0, \Theta)\, \pi_0(\Theta)\, d\Theta = \int g_1(\pi_0, \Theta)\, \pi_0(\Theta)\, d\Theta
$$

for an arbitrary prior. However, for the special case of the uniform prior, we have found
numerically that the Bayes risk of the greedy strategy and the Bayes risk of the global
strategy are similar enough that the greedy strategy is useful.
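
To make the difference between the two recursions concrete, the sketch below evaluates both Bayes risks on a small decision tree. It is an illustration under several assumptions and not the authors' computation: the recursions are written directly in their prior-averaged form (expected posterior variance), the likelihood is taken as printed in the text, and the grid `OMEGA`, the candidate times `TIMES`, and the depth of three measurements are hypothetical choices that keep the tree small.

```python
import numpy as np

OMEGA = np.linspace(0.0, 1.0, 101)     # grid over the parameter
TIMES = np.linspace(0.5, 6.0, 12)      # hypothetical candidate measurement times

def branches(belief, t):
    """Yield (probability, posterior) for each outcome of a measurement at t."""
    p0 = np.cos(OMEGA * t) ** 2        # likelihood as printed in the text
    for lik in (p0, 1.0 - p0):
        p_out = np.sum(lik * belief)
        if p_out > 0:
            yield p_out, lik * belief / p_out

def variance(belief):
    mean = np.sum(belief * OMEGA)
    return np.sum(belief * (OMEGA - mean) ** 2)

def one_step_risk(belief, t):
    """Expected posterior variance after a single measurement at time t."""
    return sum(p * variance(post) for p, post in branches(belief, t))

def greedy_risk(belief, n):
    """Bayes risk when each time minimizes only the next-step variance."""
    t = min(TIMES, key=lambda tau: one_step_risk(belief, tau))
    if n == 1:
        return one_step_risk(belief, t)
    return sum(p * greedy_risk(post, n - 1) for p, post in branches(belief, t))

def global_risk(belief, n):
    """Bayes risk when each time accounts for all remaining measurements."""
    if n == 1:
        return min(one_step_risk(belief, tau) for tau in TIMES)
    return min(
        sum(p * global_risk(post, n - 1) for p, post in branches(belief, tau))
        for tau in TIMES
    )

uniform = np.full(OMEGA.size, 1.0 / OMEGA.size)
risk_greedy = greedy_risk(uniform, 3)
risk_global = global_risk(uniform, 3)
```

The global recursion can never do worse than the greedy one, since it minimizes over the same candidate times with full look-ahead. How close the two values are depends on the prior and on the assumed design space.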



Performance comparisons 

In reference [5], it was shown via simulation that the posterior variance of the greedy 
strategy is best fit by an exponentially decreasing function of N, the total number of 
measurements. In contrast, all off-line strategies decrease at best as a linear function of 
N. 

In Figure 2, we show that the local information gain optimizing algorithm also enjoys
an exponential improvement in accuracy over naive off-line methods. Moreover, we
show that Nyquist rate sampling is unnecessary and, indeed, sub-optimal. All results stated
are obtained using a uniform prior on $[0, 1]$ and are computed numerically by exploring
every branch of the decision tree, rather than by simulation.

In order to be "fair" to the off-line methods, we restricted the adaptive methods to
explore the same experimental design specifications. That is, for this particular problem,
the adaptive algorithm was allowed to select measurement times from $[0, N_{\max}\pi]$, where
$N_{\max}$ is the total number of measurements. In principle, these methods could only do
better with a larger design specification.
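
Exploring every branch of the decision tree, as opposed to simulating random outcomes, can be sketched for an off-line strategy as follows. This is an illustration under assumptions: the likelihood is taken as printed in the text, and the evenly spaced times $k\pi$ are an assumed stand-in for an off-line schedule inside the design space $[0, N_{\max}\pi]$, not necessarily the schedule used for Figure 2.

```python
import numpy as np

OMEGA = np.linspace(0.0, 1.0, 201)
UNIFORM = np.full(OMEGA.size, 1.0 / OMEGA.size)   # uniform prior on [0, 1]

def variance(belief):
    mean = np.sum(belief * OMEGA)
    return np.sum(belief * (OMEGA - mean) ** 2)

def offline_risk(belief, times):
    """Exact expected posterior variance for a fixed, pre-chosen list of times."""
    if len(times) == 0:
        return variance(belief)
    p0 = np.cos(OMEGA * times[0]) ** 2            # likelihood as printed in the text
    risk = 0.0
    for lik in (p0, 1.0 - p0):                    # both outcomes: every branch is visited
        p_out = np.sum(lik * belief)
        if p_out > 0:
            risk += p_out * offline_risk(lik * belief / p_out, times[1:])
    return risk

# Risk as a function of the number of measurements n = 0, 1, ..., 8.
risks = [offline_risk(UNIFORM, [k * np.pi for k in range(1, n + 1)])
         for n in range(0, 9)]
```

Because every outcome sequence is weighted by its exact probability, the result carries no sampling noise, at the cost of a tree whose size doubles with each additional measurement.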

DISCUSSION 

Summarizing, we have shown that, for the problem of estimating the parameter in a simple
Hamiltonian model of qubit dynamics, an adaptive measurement strategy can exponentially
improve the accuracy over offline estimation strategies. Moreover, we have shown
that sampling at the Nyquist rate is not optimal in the case of strong measurement. We
have derived a recursive solution to the risks for both the local and global optimal
strategies. Using this solution, we numerically found that the local strategy is nearly optimal
in the special case of a uniform prior. That the greedy algorithm is nearly optimal in
a case relevant to experiment demonstrates that an adaptive Bayesian method may be
computationally feasible, in that an implementation need not consider all possible future
data when choosing each experiment.

Together, these results demonstrate the usefulness of an adaptive Bayesian algorithm
for parameter estimation in quantum mechanical systems, especially in comparison with
other algorithms in common use. In the presence of noise, this improvement becomes
still more stark, as demonstrated by the results shown in Figure 2.

FIGURE 2. Performance of the estimation strategies. The Bayesian sequential and the strategy labeled
"Nyquist" sample at the Nyquist rate. The "optimized" strategies find the global maximum utility (using
Matlab's "fmincon" starting with the optimal Nyquist time). In each case, $N_{\max} = 12$ measurements are
considered. Left: the ideal model discussed in the text. Right: a more realistic model with 25% noise and
an additional relaxation process (known as $T_2$) which exponentially decays the signal (to half its value at
$t = 10\pi$).

Why is it the case that the Nyquist times are not optimal? First, why should we
expect them to be optimal? The Nyquist theorem states that a signal which contains
no frequencies higher than $\omega_{\max}$ is completely and unambiguously characterized by a
discrete set of samples taken at a rate of at least twice that maximum frequency. However, the
classical notion of sampling fails for the strong-measurement case that we consider here.
What we have is a periodic probability distribution which can be sampled, not a periodic
function whose values can be ascertained. That is, there is no signal, in the classical
sense of the word, which can be reconstructed. The failure of the Nyquist rate sampling
is exemplified in Figure 3.

In this paper, we have chosen to measure success via the squared error loss. Although
this is a standard metric, note that it is not practically useful in the context of estimating
the parameters of a quantum mechanical system. We motivate this claim as follows. A
typical application of our algorithm is to inform control theory algorithms, which can
achieve significantly higher fidelities if given a distribution over Hamiltonians rather
than a single best estimate. Indeed, in the case of nuclear magnetic resonance, the
physical ensemble of qubits produces a real distribution of Hamiltonians against which
control theory algorithms must be robust [7, 8]. Any single estimate of the
Hamiltonian parameters will thus artificially exclude dynamics which will appear as
decoherence in the resultant pulses. Thus, we must measure the success of our algorithm
via a loss function of the true distribution and estimated posterior. Since relative
entropy is broadly considered the correct loss function for probability estimators, our
algorithm, which maximizes expected information gain, becomes the optimal solution.

We expect that in more complicated systems, the Bayesian adaptive method will 
remain useful, especially in applications such as optimal control theory, where having a 
distribution over Hamiltonians is significantly more useful than a single best estimate. 

FIGURE 3. The information gain (left) and variance (right) utilities for the prior followed by three
simulated measurements. The vertical grid lines indicate the Nyquist times. Note that the times at which
the utilities are maximized do not necessarily increase with the number of measurements.